Review for NeurIPS paper: Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
Strengths: Comments about the paper: This paper presents a convergence analysis of primal-dual natural policy gradient methods under the CMDP framework. Several recent works have shown convergence of policy gradients and optimality bounds (e.g., Agarwal et al., Mei et al.), but this paper extends similar analysis to (a) natural policy gradients and (b) the CMDP framework with constraints. Overall, it achieves a sublinear rate of convergence in the CMDP framework, comparable to other related convergence analyses. The analysis is carried out for the general MDP case with function approximation and restricted policy classes. It is a very well written paper that is easy to follow, with substantial theoretical derivation and proof details.
After reading the authors' rebuttal, the reviewers discussed their concerns about this paper. Ultimately, a consensus was not reached, as reviewer #1 feels that the issues raised in her/his review were not properly addressed in the authors' feedback. The other reviewers also share some of the concerns raised by reviewer #1, but, given the rebuttal, they believe the authors can fix them in the final version and make the contribution of their paper clearer. I agree with them and so suggest accepting the paper, but I recommend that the authors take the issues raised in the reviews into consideration and address them carefully in the final version.
Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
We study sequential decision-making problems in which an agent aims to maximize the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon Constrained Markov Decision Process (CMDP) problem. Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method for CMDPs that updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is dimension-free.
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs
Ding, Dongsheng, Zhang, Kaiqing, Duan, Jiali, Başar, Tamer, Jovanović, Mihailo R.
We study sequential decision-making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. Finally, we use computational experiments to showcase the merits and the effectiveness of our approach. Keywords: Constrained Markov decision processes; Natural policy gradient; Constrained nonconvex optimization; Method of Lagrange multipliers; Primal-dual algorithms.
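To make the two updates in the abstract concrete, here is a minimal sketch of an NPG-PD-style loop on a small synthetic tabular CMDP with a softmax policy. It assumes exact access to the model (transition tensor `P`, reward `r`, utility `g`, and budget `b` are all illustrative placeholders, not from the paper), so it does not reproduce the paper's sample-based algorithms; it only illustrates the primal natural-gradient ascent step and the projected sub-gradient dual step.

```python
import numpy as np

def value_functions(P, f, pi, gamma):
    """Exact state values V_f^pi and action values Q_f^pi for a signal f
    (reward or utility), computed by solving the Bellman equation directly."""
    nS, nA, _ = P.shape
    P_pi = np.einsum('sa,sab->sb', pi, P)   # state-to-state kernel under pi
    f_pi = np.einsum('sa,sa->s', pi, f)     # expected one-step signal under pi
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, f_pi)
    Q = f + gamma * np.einsum('sab,b->sa', P, V)
    return V, Q

def npg_pd(P, r, g, b, gamma=0.9, eta=1.0, xi=0.1, T=200):
    """Illustrative NPG-PD loop: primal natural policy gradient ascent on the
    Lagrangian, dual projected sub-gradient descent on the multiplier."""
    nS, nA, _ = P.shape
    theta = np.zeros((nS, nA))   # softmax logits (primal variable)
    lam = 0.0                    # Lagrange multiplier (dual variable)
    rho = np.full(nS, 1.0 / nS)  # assumed initial state distribution
    for _ in range(T):
        pi = np.exp(theta - theta.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        # Primal step: for softmax policies, the natural gradient step on the
        # Lagrangian r + lam * g amounts to adding its scaled Q-values
        # (equivalently, advantages) to the logits.
        _, Q_L = value_functions(P, r + lam * g, pi, gamma)
        theta += eta / (1.0 - gamma) * Q_L
        # Dual step: projected sub-gradient descent for the constraint
        # V_g(rho) >= b, with projection onto the nonnegative reals.
        V_g, _ = value_functions(P, g, pi, gamma)
        lam = max(0.0, lam - xi * (rho @ V_g - b))
    return pi, lam
```

A usage example on a random CMDP: build a normalized transition tensor, run the loop, and inspect the resulting policy and multiplier. The step sizes `eta` and `xi` here are arbitrary choices for illustration rather than the schedules analyzed in the paper.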